Optimization Based Tumor Classification from Microarray Gene Expression Data
نویسندگان
چکیده
BACKGROUND An important use of data obtained from microarray measurements is the classification of tumor types with respect to genes that are either up or down regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results depending on the type of data. Additionally, it is highly critical to find an optimal set of markers among those up or down regulated genes that can be clinically utilized to build assays for the diagnosis or to follow progression of specific cancer types. In this paper, we employ a mixed integer programming based classification algorithm named hyper-box enclosure method (HBE) for the classification of some cancer types with a minimal set of predictor genes. This optimization based method which is a user friendly and efficient classifier may allow the clinicians to diagnose and follow progression of certain cancer types. METHODOLOGY/PRINCIPAL FINDINGS We apply HBE algorithm to some well known data sets such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), small round blue cell tumors (SRBCT) to find some predictor genes that can be utilized for diagnosis and prognosis in a robust manner with a high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, information gain attribute evaluator, relief attribute evaluator and correlation-based feature selection methods are employed for the gene selection. The results are compared with those from other studies and biological roles of selected genes in corresponding cancer type are described. CONCLUSIONS/SIGNIFICANCE The performance of our algorithm overall was better than the other algorithms reported in the literature and classifiers found in WEKA data-mining package. Since it does not require a parameter optimization and it performs consistently very high prediction rate on different type of data sets, HBE method is an effective and consistent tool for cancer type prediction with a small number of gene markers.
منابع مشابه
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
We can reach by DNA microarray gene expression to such wealth of information with thousands of variables (genes). Analysis of this information can show genetic reasons of disease and tumor differences. In this study we try to reduce high-dimensional data by statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumor based on gene expression data of...
متن کاملClassification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
متن کاملModification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
متن کاملPrediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods
Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...
متن کاملSFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....
متن کاملIntegration and Reduction of Microarray Gene Expressions Using an Information Theory Approach
The DNA microarray is an important technique that allows researchers to analyze many gene expression data in parallel. Although the data can be more significant if they come out of separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have gathered through different techniques. In this paper, we prese...
متن کامل